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Abstract 

Despite its functional conservation, the mitochondrial genome (mtDNA) presents strikingly different features among eukaryotes, such 
as size, rearrangements, and amount of intergenic regions. Nonadaptive processes such as random genetic drift and mutation rate 
play a fundamental role in shaping mtDNA: the mitochondrial bottleneck and the number of germ line replications are critical factors, 
and different patterns of germ line differentiation could be responsible for the mtDNA diversity observed in eukaryotes. Among 
metazoan, bivalve mollusc mtDNAs show unusual features, like hypervariable gene arrangements, high mutation rates, large amount 
of intergenic regions, and, in some species, an unique inheritance system, the doubly uniparental inheritance (DUI). The DUI system 
offers the possibility to study the evolutionary dynamics of mtDNAs that, despite being in the same organism, experience different 
genetic drift and selective pressures. We used the DUI species Ruditapes philippinarum to study intergenic mtDNA functions, mito- 
chondrial transcription, and polymorphism in gonads. We observed: 1 ) the presence of conserved functional elements and novel open 
reading frames (ORFs) that could explain the evolutionary persistence of intergenic regions and may be involved in DUI-specific 
features; 2) that mtDNA transcription is lineage-specific and independent from the nuclear background; and 3) that male-transmitted 
and female-transmitted mtDNAs have a similar amount of polymorphism but of different kinds, due to different population size and 
selection efficiency. Our results are consistent with the hypotheses that mtDNA evolution is strongly dependent on the dynamics of 
germ line formation, and that the establishment of a male-transmitted mtDNA lineage can increase male fitness through selection on 
sperm function. 

Key words: doubly uniparental inheritance, mitochondrial intergenic regions, novel mitochondrial ORFs, germ line mitochon- 
dria, mitochondrial polymorphism, CORR. 



Introduction 

Since the symbiosis event that originated the eukaryotic cell, 
mitochondria underwent a massive process of genome reduc- 
tive evolution (GRE) (Andersson and Kurland 1998; Khachane 
et al. 2007). The protomitochondrion (most likely an alpha- 
proteobacterion, for details see Muller and Martin 1999; 
Atteia et al. 2009; Abhishek et al. 201 1; Thrash et al. 201 1) 
lost the majority of its genome in a short evolutionary time, 
before the split of eukaryotic lineages, about 1,200 Ma 
(Khachane et al. 2007). After that, mitochondria coevolved 



with different hosts and experienced both neutral modifica- 
tions and adaptive responses that led to the diversity that we 
observe today in mitochondrial genomes (mtDNAs) (Embley 
and Martin 2006). The most radical difference is between land 
plants and animals: plant mtDNAs are large and rich in non- 
coding sequences, while animal mtDNAs are more compact 
and much smaller. According to the mutation pressure 
theory (Petrov 2001; Lynch et al. 2006, 2011; Lynch 2007) 
genome evolution is shaped by mutation rate and random 
genetic drift. Nonfunctional intergenic DNA is mutationally 
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hazardous because, while it cannot suffer from loss-of- 
function mutations, it can be the substrate for gain-of- 
function deleterious mutations (Lynch et al. 2006, 2011; 
Lynch 2007). Thus, genomes with a high mutation rate are 
subject to a more intense selection for GRE, but the efficiency 
of this selection is determined by the amount of random ge- 
netic drift (i.e., effective population size, A/ e ). In taxa with re- 
duced N ei selection against the accumulation of nonfunctional 
DNA is less effective, and that would be the reason for the 
observed genome expansion during eukaryote evolution 
(Lynch 2007). As random genetic drift in plants and animals 
is similar, the difference in mitochondrial genome size can be 
explained by the much lower (~100x) mutation rate in 
plant mtDNAs compared with animal mtDNAs (Lynch et al. 
2006). 

In animal mitochondria, genomic features such as mutation 
rate, gene content, genome architecture, compositional prop- 
erties, and gene strand asymmetry are variable among taxa, 
reflecting their different evolutionary histories (Gissi et al. 
2008). A large number of studies attempted to unveil the 
reasons behind the different mutation rates among animal 
mtDNA lineages, investigating the relationship between 
such rates and body mass, metabolic rate, reactive oxygen 
species (ROS) production, and lifespan (see Galtier et al. 
2009a for an overview). The matter remains unsolved, but 
there is clear evidence for a leading role of DNA replication 
on base-substitution mutations: despite its proof-reading 
function, most mutations arise from DNA polymerase errors 
(Drake et al. 1 998; Lynch et al. 2006). Following this rationale, 
most of the heritable mutations are accumulated during germ 
line proliferation, when germ cells undergo several rounds of 
replication, and this implies that reproduction mode and 
gonad physiology affect evolutionary rates, as suggested by 
several authors (Rand 2001; Davison 2006). For example, in 
bivalve molluscs gametes are formed by proliferation of ger- 
minal cells in acini (Devauchelle 1990), the gonadic units con- 
taining the germinative tissue that lines the acinus wall. The 
gonad develops until it becomes fully mature then, after one 
or more spawning events, it is depleted. At the beginning of 
the following reproductive season, the spent gonad under- 
goes a period of reconstitution, and the cycle starts again 
(Gosling 2003). It follows that in bivalves the number of cell 
divisions in germ line does not show a marked asymmetry 
between males and females in contrast with what happens, 
for example, in mammals (Davison 2006). This feature, to- 
gether with the production of an extremely large number of 
gametes due to broadcast spawning, implies a large number 
of cell divisions in both germ lines, resulting in a higher mu- 
tation rate in comparison to species that show male-driven 
evolution (Ellegren 2007). Actually, bivalves show an extraor- 
dinary amount of nucleotide polymorphism in both mitochon- 
drial and nuclear genomes (Saavedra and Bachere 2006), and, 
in sharp contrast with deuterostomes which have almost in- 
variant mitochondrial gene order (Gissi et al. 2008, but see 



Gissi et al. 2010 for an exception), bivalves present highly 
rearranged mtDNAs, even at the intra-genus level. The asso- 
ciation between polymorphism and gene order variability is 
not surprising: it is well established that sequence evolution 
and genome rearrangement are positively correlated (Begun 
and Aquadro 1992; Shao et al. 2003; Xu et al. 2006; Koonin 
2009), even if the reasons behind this are still object of a 
heated debate (Begun and Aquadro 1992; Charlesworth 
et al. 1993; Nachman 2001). What is more surprising is the 
association, in bivalve mtDNAs, of a high mutation rate with 
the presence of quite large mitochondrial genomes. 

An even more interesting feature of bivalves is the presence 
of an unusual mitochondrial inheritance system: the doubly 
uniparental inheritance (DUI; Skibinski et al. 1994; Zouros 
et al. 1994). So far, DUI has been detected in 46 bivalve spe- 
cies (Theologidis et al. 2008; Breton et al. 201 1b), belonging 
to seven families. In DUI species, two mtDNAs are present: one 
is transmitted through eggs (F-type, for female-inherited), the 
other through sperm (M-type, for male-inherited), and the 
divergence between conspecific M and F genomes ranges 
from 10% to over 50% (see Breton et al. 2007 and Zouros 
2012 for reviews). 

In this work, we analyzed the mtDNAs of the DUI species 
Ruditapes philippinarum (Manila clam). The complete M and F 
genomes of R. philippinarum were submitted to GenBank in 
2001 by Okazaki and Ueshima (Accession Nos.: AB065374 
and AB065375, respectively), but a detailed characterization 
has not been published so far. We Sanger-sequenced the M 
and F major unassigned regions (URs), identifying the control 
regions (CRs) as well as motifs and secondary structures at 
both DNA and RNA level. Then, we obtained the M-type 
and F-type transcriptomes by RNA-Seq on lllumina GA llx 
platform and performed a single-nucleotide polymorphism 
(SNP) analysis. Our main objectives were to 1) identify 
conserved functional elements and novel open reading 
frames (ORFs) that could explain the evolutionary persistence 
of intergenic regions in this species and other bivalves with 
DUI, 2) test, for the first time, if the mtDNA transcription in 
bivalves with DUI is lineage-specific and/or independent 
from the nuclear background, and 3) verify whether the 
male-transmitted and female-transmitted mtDNAs have a sim- 
ilar amount of polymorphism, and investigate the type of 
molecular variation occurring in the two mitochondrial 
lineages. 

On a more general level, DUI systems can help understand- 
ing the complex relationship among multiple levels of selec- 
tion and complex population dynamics that underlay 
mitochondrial genome evolution. Our data support the hy- 
pothesis that mtDNA evolution is strongly dependent on the 
dynamics of germ line formation, and suggest that the estab- 
lishment of a male-transmitted mtDNA lineage can be bene- 
ficial, increasing male fitness through selection on sperm 
function. 
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Materials and Methods 

Proportion of URs in Mitochondrial Genomes of 
Metazoans 

In February 2012, 2,656 complete mitochondrial genomes 
were downloaded from the MitoZoa database release 10 
(http://mi.caspur.it/mitozoa/ [last accessed August 2, 2013], 
Lupi et al. 201 0; D'Onorio de Meo et al. 201 2), and analyzed 
with custom Unix and R scripts to obtain the data shown in 
table 1 . Given the marked difference in sample size among 
animal groups, to improve statistical power, we included in 
the analysis only taxa for which more than 60 complete mi- 
tochondrial genomes were available. 

Gamete Collection and DNA Extraction 

Gametes were collected from seven males and eight females 
using the procedure described in Ghiselli et al. (201 1). Sperm 
samples were purified using a Percoll (GE Healthcare) gradi- 
ent, as in Venetis et al. (2006). Egg samples were collected and 
centrifuged, then seawater was replaced with absolute etha- 
nol. Total DNA was extracted from gametes with the DNeasy 
(Qiagen) and the MasterPure Complete DNA and RNA 
Purification Kit (Epicentre). 

Polymerase Chain Reactions and Sequencing 

DNA extractions were used as template for the polymerase 
chain reactions (PCRs): sperm extractions were used to obtain 
male largest unassigned region (MLUR) and male unassigned 
region 21 (MUR21) sequences, whereas eggs extractions for 
female unassigned region 21 (FUR21) and female largest 
unassigned region (FLUR). Primers were designed with 
Primer 3 (Rozen and Skaletsky 2000) on the complete R. phi- 
lippinarum M and F mitochondrial genomes present in 
GenBank (Accession Nos.: AB065374-5; Okazaki M and 
Ueshima R, unpublished data). Primer pairs and their se- 
quences are enlisted in supplementary table S1, 
Supplementary Material online. PCR amplifications were per- 
formed on a 2720 Thermal Cycler (Applied Biosystems) in a 
50jiL reaction volume using the GoTaq Flexi Dna Polymerase 



(Promega) kit. The reaction volume was composed of 24 jiL of 
Nuclease-free Water (Ambion Inc.), 10jil_ of Green GoTaq 
Flexi Buffer 5x (Promega), 6jiL of MgCI 25 mM, 1 jiL of 
dNTPs (Promega) mix 40jiM (10jiM each dNTP), 2.5 jiL of 
each primer (10jiM) (Invitrogen SRL), 4jiL of DNA template 
and 0.25 jiL of GoTaq DNA Polymerase (Promega) 5U/jil_. 
PCRs were performed with the following cycle: an initial de- 
naturation at 95 °C for 2 min, followed by 30 cycles of dena- 
turation at 95 °C for 30s, annealing at 48 °C for 30s and 
extension at 72 °C for 90s, then a final extension at 72 °C 
for 5 min. Every PCR product was checked by agarose gel 
electrophoresis. PCR products were purified using the 
Wizard SV Gel and PCR clean-up system (Promega) kit or 
the GenElute PCR clean-up kit and the GenElute Gel extrac- 
tion kit (Sigma-Aldrich), following the manufacturer instruc- 
tions. Sequencing was performed by Macrogen Inc. (Seoul, 
South Korea). Sequences were checked, aligned, and assem- 
bled manually using MEGA5 (Tamura et al. 201 1). 

Annotation of LURs 

Ruditapes philippinarum largest unassigned regions (LURs) 
structure was defined using blastn (http://blast.ncbi.nlm.nih. 
gov/Blast.cgi, last accessed August 2, 2013) and with 
manual alignments. Repeat units were identified with 
Tandem Repeats Finder (http://tandem.bu.edu/trf/trf.html, 
last accessed August 2, 2013) (Benson 1999) and Repeat 
Finder (http://www.proweb.org/proweb/Tools/selfblast.html, 
last accessed August 2, 2013). ORFs in MUR21 and FLUR 
were identified with ORF Finder (http://www.ncbi.nlm.nih. 
gov/gorf, last accessed August 2, 201 3) using the invertebrate 
mitochondrial genetic code. 

Conserved Motifs 

A search for conserved sequence motifs in R. philippinarum mt 
LURs and in 9 other Veneroid mt LURs (supplementary table 
S2, Supplementary Material online) was performed with 
MEME (Multiple Em for Motifs Elicitation; http://meme.nbcr. 
net/meme/cgi-bin/meme.cgi, last accessed August 2, 2013) 
(Bailey et al. 2009). The found motifs were submitted to 



Table 1 

Proportion of URs in the Mitochondrial Genomes of Metazoans 



Taxa 


N 


Median Total Length 


Median URs Length 


Median %cod 


Median %URs 


Significance 


Metazoa 


2,656 


16,544 


1,047 


93.4 


6.6 


n.s. 


Chordata 


1,852 


16,606 


1,062 


93.6 


6.4 


n.s. 


Arthropoda 


415 


15,587 


945 


93.9 


6.1 


n.s. 


Nematoda 


66 


13,972 


843 


94.0 


6.0 


n.s. 


Mollusca 


134 


16,195 


1,311 


91.9 


8.1 


n.s. 


Gastropoda 


49 


15,129 


258 


98.3 


1.7 


*** 


Bivalvia 


64 


16,898 


1,886 


88.8 


11.2 


*** 



Note. — N, sample number; median total length, median total length of the mitochondrial genome; median URs length, median total length of the 
URs; median %cod, median proportion of coding regions in the genomes; median %URs, median proportion of URs in the genomes. 
Significance, Wilcoxon rank-sum test significance: ***p< 0.001, n.s., nonsignificant. 
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GOMO (Gene Ontology for Motifs; http://meme.nbcr.net/ 
meme/cgi-bin/gomo.cgi, last accessed August 2, 2013) 
(Buske et al. 2010), which assigned them a list of GO terms. 

AT-Skew Analysis 

To find indications on the location of the H-strand and 
L-strand origin of replication (0 H and 0 L , respectively) in R. 
philippinarum mt genomes, we calculated the AT-skew 
values on 4-fold redundant sites of protein-coding genes, 
using the formula (A + T)/(A - T). See Breton et al. (2009) 
for a detailed discussion. To support the findings, the analysis 
was extended to eight other Veneridae mt genomes (supple- 
mentary table S2, Supplementary Material online). 

Secondary Structures 

The mfold web server (http://mfold.rna. albany.edu/?q=mfold/ 
download-mfold, last accessed August 2, 2013) (Zuker 2003) 
was used for DNA secondary structure prediction. The analysis 
was performed with default settings except for folding tem- 
perature: we used the value of 1 5 ° C, which is the mean water 
temperature in the Venice lagoon during the reproductive 
season. Only the structures with the lowest AG value and 
showing conservation among the analyzed samples were 
selected. 

The RNAz web server (http://rna.tbi.univie.ac.at/cgi-bin/ 
RNAz.cgi, last accessed August 2, 2013) (Washietl et al. 
2005) was used for RNA secondary structure prediction. A 
window size of 200 bp and a window step-size of 100 bp 
were used. According to the software manual, alignments 
with P>0.5 are classified as functional, and a negative z 
score indicates a stable structure. To avoid misinterpretation, 
we used a strict cutoff and excluded all the structures with a 
P>0.95 and a z score < -4. Structure names legend: DS, 
DNA structure; RS, RNA structure; m, M-type; f, F-type. 

Transcriptome Analysis 

A cDNA library from 6 male and 6 female gonads was pro- 
duced following the protocol of Mortazavi et al. (2008). The 
library was sequenced on an lllumina GAIIx platform with 71- 
bp paired-end reads. The samples were barcoded, pooled, 
and sequenced on two lanes (two technical replicates). For 
detailed information about sampling, library preparation, 
and sequencing see Ghiselli et al. (2012). Reads were 
mapped to the R. philipinarum complete mitochondrial 
genomes (GenBank Accession Nos. AB065374-5) and 
allowed up to six mismatches per end. 

SNPs 

We used the Genome Analysis Toolkit (GATK, McKenna et al. 
2010) for base quality score recalibration, indel realignment, 
duplicate removal, and performed SNP and INDEL discovery 
and genotyping using standard hard filtering parameters or 



variant quality score recalibration (DePristo et al. 201 1). SNP 
effects were analyzed using the snpEff software (Cingolani 
etal. 2012). 

Statistical Analysis 

All data were checked for homoscedasticity, and variance sta- 
bilizing transformations were applied, where necessary, before 
tests. To show statistical dispersion, nontransformed data were 
used in boxplots. As in most cases data were not normally 
distributed, for uniformity we always applied nonparametric 
tests and, where not specified, P values are referred to the 
Wilcoxon rank-sum test. Statistical analysis and graphs were 
produced using R. Post hoc multiple comparison tests after 
Kruskal-Wallis nonparametric analysis of variance (ANOVA) 
were performed with the kruskalmc function (Siegel and 
Castellan 1988) implemented in the pgirmess R package. 

Results 

Proportion of URs in Mitochondrial Genomes of 
Metazoans 

The analysis of 2,656 complete mitochondrial genomes 
present in the MitoZoa Database allowed us to assess the 
proportion of URs in several groups of metazoans (table 1). 
gastropods and bivalves have a proportion of URs which is 
significantly different from all other groups (P< 0.001): gas- 
tropods have the most compact mitochondrial genome, 
whereas bivalves show the highest median percentage of URs. 

Structure of R. philippinarum Large URs 

Figure 1 resumes the main features of the major URs in M- and 
F-type mtDNAs. Figure 1a shows conserved regions, motifs, 
repeated units, and major secondary structures (both at DNA 
and RNA level). Figure ^b shows transcription depth, nucleo- 
tide variability (inter-lineage p-distance), and the ribbons link 
conserved region and motifs within and between major URs. 
Length of conserved blocks, repeated units, and motifs are 
included in supplementary tables S3-S6, Supplementary 
Material online. 

Ruditapes philippinarum F and M mitochondrial genomes 
contain two major URs. In the M genome, the MLUR is located 
between nd4L and tRNA-lle and it is preceded by the second 
largest UR (between nd2 and nd4L), MUR21 . In the F genome, 
the FLUR is located between nd2 and nd4L and it is followed 
by the second largest UR (FUR21; between nd4L and tRNA- 
lle). Overall, the two major URs (MUR21 and MLUR in M, FLUR 
and FUR21 in F) represent about 90% of the total amount of 
intergenic DNA in R. philippinarum mtDNAs. The obtained 
sequences are available in GenBank (FLURs: accession nos. 
KC243324-31; FUR21s: accession nos. KC243332-9; 
MLURs: accession nos. KC243340-6; MUR21s: accession 
nos. KC243347-53). The largest of these four URs is MLUR 
(3,588-3,610 bp), while the shortest is MUR21 (959 bp). FLUR 
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Fig. 1. — Features of Ruditapes philippinarum major URs. Main features of the largest URs in M- and F-type mtDNAs. (a) Conserved regions, motifs, 
repeated units and major secondary structures (both at DNA and RNA level). M, M-type mtDNA; F, F-type mtDNA; orange, motif a; turquoise, novel ORFs; 
dark green, subunit A; orchid, subunit B; red, subunit C; yellow, motif 5; light green, motif y; black, motif e; blue, motif P; G, G-homopolymer; Ra, Rb, Rc, R', 

(continued) 
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length is highly variable (from 2,185 to -2,800 bp) due to a 
different number of repeated units (fig. 1a). FUR21 ranges 
from 1,767 to 1,771 bp. 

Both MUR21 and FLUR contain, just upstream nd4L, 
a novel conserved lineage-specific ORF to which we 
will refer, from now on, as MORF and FORF, respectively 
(see supplementary figs. S1-S4 [Supplementary Material 
online] for nucleotidic and amino acid sequence alignments). 
MORF sequence is 51 9 bp long (1 72 aa), while FORF is 408 bp 
(135 aa). These sequences did not show any obvious homol- 
ogy with known proteins. To better understand origin and 
function of novel mitochondrial ORFs in DUI bivalves, an 
in-depth comparative analysis using multiple in silico 
approaches was performed in Milani et al. (2013). 

Conserved Functional Motifs and Identification of Origins 
of Replication 

We used the MEME suite and AT-skew analysis to 
identify molecular signatures of the origins of replication. 
Interestingly, two motifs, 5 and y (supplementary fig. S5a 
and b, Supplementary Material online), showed sequence sim- 
ilarity with motifs Sp1, Sp2, and Sp3 of the sea urchin 
Strongylocentrotus purpuratus mitochondrial CR (Jacobs 
et al. 1989; Cao et al. 2004). These motifs were found to 
be conserved also in the mitochondrial LURs of nine veneroid 
species (supplementary table S2, Supplementary Material 
online). Motif 5 (40 bp; supplementary table S7 and fig. S5a, 
Supplementary Material online) corresponds to sea urchin Sp2 
and the first part of Sp3 (P value = 7.32E- 16), while motif y 
(41 bp; supplementary table S8 and fig. S5b, Supplementary 
Material online) corresponds to a reversed segment of Sp1 
(P value = 3.45E- 10). A search with GOMO assigned to 
these two motifs a series of GO terms, many of which are 
related to transcription and DNA binding (supplementary table 
S9, Supplementary Material online). MEME also identified two 
motifs, p and s (supplementary fig. S5cand d, Supplementary 
Material online), that are specific of R. philippinarum. p is 
present in both M- and F-type mtDNAs, while s is M-type 
specific (fig. 1a). All the performed analyses failed to identify 
similarities with known motifs, therefore we are unable to 
assign a putative function to motifs p and s. 

AT-skew values, calculated in nine Veneridae species, are 
shown in supplementary table S10, Supplementary Material 
online. In R. philippinarum, F genome AT-skew values do not 
show any significant similarity with those of the other ge- 
nomes, so comparisons cannot be made. As a general pattern, 



the genes with the highest values are those nearest to the LUR 
while the lowest-scoring genes are associated to the same 
three tRNAs, that is, tRNA-His, tRNA-Glu, and tRNA-Ser. 
Paphia undulata, P. textile, and Meretrix lamarckii mt ge- 
nomes differ from this general scheme in only one of the 
aspects, whereas R. philippinarum F genome in both. 

Secondary Structures 

Supplementary table S11, Supplementary Material online, 
summarizes the principal features of DNA and RNA secondary 
structures, while supplementary table S12, Supplementary 
Material online, shows the detailed results of RNAz analysis. 
Four major DNA structures were identified, two in M-type and 
two in F-type (fig. 1 ). DS1 m and DS2m (supplementary figs. S6 
and S7, Supplementary Material online) in the MLUR, DS1f 
and DS2f (supplementary figs. S8 and S9, Supplementary 
Material online) in the FUR21. The most interesting features 
are 1) the terminal "b" loop of DS1m shows a TT/AA poly- 
morphism; 2) the terminal "f" loop of DS1m shows a TGT/ 
ACA polymorphism; 3) the "m" loop of DS2m and the "i" 
loop of DS2f have the same sequence (CGGTTTCAGAAG); 
and 4) the "\" loop of DS2m and the "h" loop of DS2f 
share the first 4 and the last 3 bases ( TAA GTAAAA CG in 
the male, and TAAGGTYACG in the female). 

The analysis with RNAz identified 6 structures in the MLUR 
and 5 in the FUR21 (fig. 1a). Among them, three structures 
(RS1, RS2, and RS3, supplementary figs. S10-S12, Supple- 
mentary Material online) are conserved between M-type 
and F-type, three (RS4m, RS5m, and RSm6; supplementary 
figs. S13-S15, Supplementary Material online) are M-type 
specific and two (RS4f and RS5f; supplementary fig. S16 
and S17, Supplementary Material online) are F-type specific. 

Transcription of Mitochondrial Genomes 

Overall, of the 90,233,244 sequenced reads, 9,895,466 
(9.12%) mapped to mtDNA. Transcription mapping to the 
mitochondrial genomes is shown in the Circos diagram of 
figure 2, while supplementary figure S18, Supplementary 
Material online, shows the amount of mitochondrial reads: 
there is no significant difference in total amount of reads be- 
tween males and females. The distribution of M-type and 
F-type transcripts is also shown in supplementary figure S18, 
Supplementary Material online: on average, 90.11% of the 
transcripts in male gonads are M-type. We found small traces 
(0.36%) of M-type transcripts in female gonads. 



Fig. 1. — Continued 

R", R'" ', M-type-specific repeats; R1-R1 1, Repeats; V, variable length spacer. (b) Circos diagram of the LURs of M- and F-type mtDNA showing transcription 
depth (orange gradient) and nucleotidic variability (inter-lineage p-distance, gray gradient) of the largest URs. The ribbons link conserved region and motifs 
within and between major URs. Note. — M-type above, F-type below. From the outside to the inside: transcription level (orange gradient scale 0-9000), 
p-distance (gray gradient scale 0-0.58), subunits and motifs with links between M-type and F-type. Orange, Motif a; turquoise, novel ORFs; dark green, 
subunit A; orchid, subunit B; red, subunit C; yellow, motif 5; light green, motif y; black, motif e; blue, motif (3. 



1540 Genome Biol. Evol. 5(8): 1 535-1 554. doi:10.1093/gbe/evt1 12 Advance Access publication July 23, 2013 



Structure, Transcription, and Variability of Metazoan mtDNA 



GBE 




Fig. 2. — Circos diagrams of M-type and F-type transcriptomes. Transcription depth and SNPs mapped to the Ruditapes philippinarum mitochondrial 
genomes (GenBank accession nos.: AB065374 and AB065375). Genes are colored according to ETC complexes: green, complex I; brown, complex III; red, 
complex IV; orange, complex V. Ribosomal genes are colored in yellow, URs in gray, MORF in purple, and tRNAs in white. Histograms represent reads depth 
of F-type mtDNA (light red) and M-type mtDNA (light blue); black lines scale CM[log 10 ~ 1 ]. Dots represent SNP position and frequency in protein coding 
genes; black lines scale 0-1 . 



The analysis of mitochondrial Coding Sequences (CDSs) re- 
vealed significant transcriptional differences: boxplots in figure 
3a-c show the gene-by-gene transcription levels of M-type in 
males (black), F-type in females (white) and F-type in males 
(gray), while figure 3d compares the three transcription pro- 
files. Wilcoxon rank-sum test was used to assess differential 
transcription between M-type (solid line with squares) and 
F-type in females (dashed line with circles). Spearman rank cor- 
relation test and Kendall tau test were used to assess the cor- 
relation between transcription of M-type (M), F-type in males 
(Fm), and F-type in females (F) (table 2). It is worth noting that F- 
type follows the same transcription profile independently from 
the nuclear background (i.e., the profile is the same in males 
and females; p = 0.965, P< 0.001 ; x = 0.890, P< 0.001). 

Supplementary table S13, Supplementary Material online, 
shows the list of annotated nuclear-encoded ETC genes used 
in the analysis. The transcription of nuclear-encoded ETC 
genes is reported in figure 4a. No significant differences 
were found between males and females, except for genes 
of Complex III that show a slightly higher transcription in 
males (P<0.05). Conversely, transcription of mitochondrially 
encoded ETC genes is always significantly different between 
M- and F-type, with the former being more transcribed for 
Complexes I and V, the latter for Complexes III and IV (fig. 4b). 

The analysis of M-type mtDNA transcriptome showed that 
three mitochondrial coding genes (nd4, nd5, and nd4L) have a 



similar transcription level to MORF, and one (nd2) is less tran- 
scribed (supplementary table S14, Supplementary Material 
online; fig. 3a). On the contrary, FORF showed a very low 
transcription rate and its transcription level is significantly 
lower than all the F-mtDNA CDSs (supplementary table S15, 
Supplementary Material online; fig. 3b). 

SNP Analysis 

Table 3 reports SNP quality and coverage. In all the three mi- 
tochondrial genomes (F, Fm, and M) more than 93% of the 
SNPs exceeds a Phred score of 50. SNPs with Phred scores 
below 30 were not called. The coverage is high: only 8 SNPs 
(1 .4% of the total) in the Fm genome and 2 SNPs (0.0048%) 
in the M genome have a depth less than 25. On the other side, 
the vast majority of the SNPs have a coverage less than 1 0Ox 
(97%, 92.9%, and 98.5% for F, Fm, and M genomes). 
Supplementary figure S19, Supplementary Material online, 
shows the scatter plot of the coverage against the number 
of SNPs (normalized to gene length). Spearman rank correla- 
tion test and Kendall tau test are not significant (x = -0.18, 
p=ns; p = -0.26, P=ns), supporting the absence of corre- 
lation between number of reads and number of called SNPs. 

The kernel density plot of allele frequencies (fig. 5) evi- 
dences a different distribution between F and M mitochondrial 
genomes (Kolmogorov-Smirnov P= 0.0061 ): the F-type shows 
an excess of rare alleles (frequency < 0.1 25), while M-type has 
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Fig. 3. — Transcription level of mitochondrial protein coding genes, (a) Transcription of M-type in males; (b) transcription of F-type in females; (c) transcrip- 
tion of F-type in males; and (of) transcription profiles (median values used). On they axis is plotted the FPKM value. The lines that links the genes in (of) are 
virtual. Their purpose is to highlight the differences and similarities of transcription profiles. See table 3 for the correlation tests between mtDNA transcripts. In 
(d), the significance of Wilcoxon rank-sum test between M-type (M) and F-type in females (F) is reported below the x axis. *P<0.05, **P<0.01, 
***p< 0.001, ns, nonsignificant; na, not applicable. 



Table 2 

Mitochondrial Transcription Correlation Tests 



Test 


Genomes 


Significance 


Notes 


Spearman 


M vs. F 


* 


p = 0.600 


Spearman 


M vs. Fm 


* 


p = 0.600 


Spearman 


F vs. Fm 


*** 


p = 0.965 


Kendall 


M vs. F 


* 


t = 0.451 


Kendall 


M vs. Fm 


* 


t = 0.429 


Kendall 


F vs. Fm 


*** 


t = 0.890 



Note— M, M-type mtDNA; F, F-type mtDNA; Fm, F-type mtDNA in males; p, 
Spearman's rank correlation coefficient; x, Kendall tau rank correlation coefficient. 
*P<0.05. 
***P< 0.001. 



a pronounced peak around 0.5. The distribution in the Fm 
genome (not shown) is not statistically different from that in F. 

Table 4 summarizes the SNP analysis. M-type has signifi- 
cantly less SNPs (P< 0.001) in comparison with both the 
F-types (F and Fm), which, conversely, do not differ between 
them (table 4 and fig. 6a). We subdivided the SNPs according 
to whether they are present with a single allele or multiple 
alleles in an individual. We called the former type "monoallelic 
SNP" and the latter "polyallelic SNP". Polyallelic SNPs have 



always the reference allele among their variants. Compared 
with polyallelic SNPs, monoallelic SNPs have a lower propor- 
tion of nonsynonymous substitutions (Ns/Tot, table 4) in all the 
genomes (P= 1 .904E-7). Boxplots in figure 6b and c show 
the proportion of nonsynonymous changes in polyallelic 
(fig. 6b) and monoallelic (fig. 6c) SNPs. The SNPs were sub- 
divided in three classes according to their effect on genes 
(high, moderate, and low): boxplots in figure 6d-f show the 
proportion of the total amount of SNPs pertaining to each 
class, whereas in figure 6g-i only the monoallelic SNPs are 
considered. 

Discussion 

Bivalve mtDNAs Contain a High Proportion of URs 

Bivalvian mtDNAs have, on average, 1 .7x the amount of URs 
in respect to analyzed Metazoa (1 1 .2% vs. 6.6%, P< 0.001 ; 
table 1). How does noncoding DNA accumulate in mitochon- 
drial genomes? The principal mechanisms affecting mitochon- 
drial genome structural evolution are 1) slipped-strand 
mispairing, 2) errors in termination of replication, 3) recombi- 
nation, and according to the duplication-random loss model, 
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ns * ns ns *** *** *** ** 

Fig. 4— Transcription level of electron transport chain (ETC) genes, (a) Nuclear-encoded genes: black, male gonad; white, female gonad, 
(b) Mitocondrially encoded genes: black, M-type mtDNA; white, F-type mtDNA. I, III, IV, and V represents the ETC complexes: the analyzed genes and 
their accession numbers are enlisted in supplementary table S10-S13, Supplementary Material online. Complex II proteins are encoded only by nuclear 
genes, so they were not included in the analysis. On the y axis is plotted the FPKM value. Wilcoxon rank-sum test significance: *P<0.05, **P<0.01; 
***p< 0.001. 



Table 3 

SNP Quality and Coverage 

Genome Phred Score 



Min Max Mean Median 30-40 40-50 >50 

F 30 122,730 9,259.64 804.02 3.7% 3.1% 93.2% 

Fm 30.23 126,287 12,170.75 623.44 3% 2.9% 94.1% 

M 30.23 126,287 17,005.83 861.77 2.6% 3.6% 93.8% 



Genome Depth 



Min Max Mean Median <25 25-100 100-1,000 >1,000 

F 25 2,997 1,628.723 1,772 0% 3% 34.4% 62.6% 

Fm 3 3,000 1,540 1,574 1.4% 5.7% 36% 56.9% 

M 20 3,000 1,866 2,150 4.8E-3% 9.6E-3% 27.4% 71.1% 



noncoding regions may arise from random pseudogenization 
of duplicated gene copies (Boore 2000). 

The already mentioned high variability of gene order and 
the presence of duplicated genes (Ren et al. 201 0; Passamonti 
et al. 2011; Okazaki M and Ueshima R, unpublished data) 
support the common occurrence of gene rearrangements in 
bivalve mitochondrial genomes. In particular, in bivalve species 
with DUI, mtDNA recombination is easily detectable, given the 
sequence divergence between M and F genomes (Ladoukakis 
et al. 201 1 and references therein), and extensive rearrange- 
ments and duplications of parts of the CR have been well 
documented in Mytilus (Burzynski et al. 2003, 2006; Breton 



et al. 2006; Venetis et al. 2007; Cao et al. 2009). Recently, 
Ladoukakis et al. (2011) reported mitochondrial recombina- 
tion between sequences with more than 20% divergence in 
the DUI species Mytilus galloprovincialis, showing that recom- 
bination is not restricted to sequences with low divergence. As 
explanation, the authors hypothesized a relaxation of the mis- 
match repairing system in animal mitochondria, but then, why 
are not mtDNA rearrangements more common in meta- 
zoans? Gissi et al. (2010) found hypervariability in ascidian 
mtDNA gene order, comparable only with that observed in 
molluscs. The only conserved feature among mitochondrial 
genomes of Tunicata is that all the genes are coded on the 
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0.0 0.5 1.0 

Allele Frequency 



Fig. 5. — Kernel density plot of allele frequencies in mitochondrial 
CDSs. Probability density function of allele frequencies calculated by 
kernel density estimation. Solid line: M-type mtDNA; dashed line: F-type 
mtDNA in female gonads. The two distributions are significantly different 
(Kolmogorov-Smirnov P= 0.0061): the F-type shows an excess of rare 
alleles (frequency < 0.125), while M-type has a pronounced peak 
around 0.5. The distribution in the Fm genome (not shown) is not statis- 
tically different from that in F. 



same strand, a feature that they share with all marine bivalves. 
Ren et al. (201 0) suggested that coding on both strands could 
be a factor inhibiting recombination. Interestingly, among bi- 
valves, freshwater mussels (family Unionidae) have dual-strand 
coding and show few mtDNA rearrangements with a propor- 
tion of URs that is much lower compared with the other spe- 
cies of the class (median in unionids = 7.9%, N= 18; median 
in other bivalves= 13%, A/ = 46; P< 0.001). 

According to the Mutation Pressure theory, fast evolving 
organelle genomes are more exposed to a selective pressure 
for genome reduction. Bivalve mtDNAs seem to contradict this 
theory, because their hypervariability is coupled with a high 
percentage of intergenic DNA. But are bivalvian URs really 
nonfunctional? What if their retention in the genome is 
caused by the presence of functional sequences and/or 
structures? 

Lineage-Specific Novel ORFs 

Lineage-specific novel ORFs in DUI mtDNAs were already 
found in Mytilidae and Unionidae (Breton et al. 2009, 2010, 
201 1a, 201 1b) and this is the first evidence from the family 
Veneridae. In the unionid, Venustaconcha ellipsiformis the 
translation of both FORF and MORF was demonstrated by 
Western blot (Breton et al. 2009), and the FORF protein was 
localized by immuno electron microscopy in both mitochon- 
dria and nucleus of the eggs (Breton et al. 2011b). A func- 
tional role of the lineage-specific mitochondrial ORFs 
identified in DUI bivalves was hypothesized: specifically, 
Breton et al. (201 1 b) proposed a role in germ line determina- 
tion and maintenance of gonochorism. Given the tight asso- 
ciation between the presence of M-type mtDNA and 
maleness, a role of DUI in sex differentiation was proposed 



(reviewed in Passamonti and Ghiselli 2009; Zouros 2012), but 
whether this coupling is causative or associative is still matter 
of debate (Zouros 2012). It is worth noting that the influence 
of a mitochondrial ORF on germ line development is well doc- 
umented in plants (Cytoplasmic Male Sterility, CMS; Chase 
2007). 

One might ask why do MORFs and FORFs need to be 
retained in the mtDNA and do not migrate in the nucleus. If 
these ORFs have a lineage-specific role (i.e., they are function- 
ally linked to M- or F-type, and/or they represent some sort of 
tag) their migration to the nuclear genome would likely affect 
their function, especially considering that bivalves do not have 
sex chromosomes, or at least they are not morphologically 
distinguishable, thus they do recombine. Another possibility 
is that a nuclear copy of the ORFs exists and the mitochondrial 
copy will be lost as a result of selection (see Allen 2003, §4g, 
point [iii]), even if our analyses do not indicate an accumula- 
tion of mutations in the ORFs. 

Conserved Motifs and Origin of Replication 

Figure 1b highlights the connections between subunits and 
motifs between the major URs of M- and F-type. We com- 
pared M and F mtDNAs to identify similarities and differences: 
similarities are supposed to be linked to a common physiolog- 
ical function (i.e., control of replication and transcription), 
whereas differences could be involved in the different "behav- 
ior" of the two mitochondrial lineages. Sequence alignments 
identified three conserved regions, subunit A, subunit B, and 
subunit C (fig. 1). From a functional point of view, subunits A 
and B and their neighboring regions seem to be the most 
interesting. Inside and right after subunit B are present two 
motifs (5 and y, respectively), which show a strong conserva- 
tion among the family Veneridae and, most importantly, with 
the sea urchin 5. purpuratus (E value 5.5E-81 for 5 and 
9.4E-54 for y; see Results and supplementary tables S7 and 
S8, Supplementary Material online, for details) whose CR has 
been characterized (Jacobs et al. 1989). Motifs 5 and y match 
elements of the sea urchin CR, which are homologous to the 
mammalian Conserved Sequence Blocks (CSBs) (Cantatore 
et al. 1 989, 1 990; Jacobs et al. 1 989). CSBs have a fundamen- 
tal role in the initiation of mtDNA replication, particularly in the 
formation of the R-loop, an RNA primer that is necessary for 
the formation of the D-loop and the start of H-strand synthesis 
(Scheffler 2008). The GOMO tool assigned GO terms related 
to transcription and DNA binding to both motifs 5 and y, 
further supporting their involvement in replication and tran- 
scription initiation, which are intimately linked in mitochondria 
(Scheffler 2008). Moreover, Cao et al. (2004) reported a 
match between some motifs found in the CR of the marine 
mussels M. edulis and M. galloprovincialis with the above- 
mentioned elements of the sea urchin CR. All that considered, 
we can deduce that subunit B is close to 0 H and that MLUR 
and FUR21 are the CRs of the M and F mitochondrial 
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genomes, respectively. To further support this hypothesis, we 
performed a comparative AT-skew analysis on complete mt 
genomes of R. philippinarum and other eight Veneridae (sup- 
plementary table S10, Supplementary Material online). The 
distribution of the values points to a location of the 0 H 
inside the LUR, corroborating the hypothesis that this region 
is/contains the CR. The analysis also gave us clues about the 
localization of 0 L that seems to be strictly associated to the 
presence of a conserved tRNA cluster composed by tRNA-His, 
tRNA-Glu, and tRNA-Ser. 0 L is thought to be associated to 
secondary structures: given our findings, this tRNA cluster 
could provide such signal and a similar situation, where a 
three tRNA cluster may function as 0 L , is also found in unionid 
mtDNAs (Breton et al. 2009). Interestingly, R. philippinarum F 
mtDNA did not show a pattern comparable with other ge- 
nomes. Possible explanations for this incongruence may be: 1) 
variable locations of the origins of replication in these species, 
as observed by Breton et al. (2009) in unionid bivalves; 2) 
recent mtDNA rearrangements (Ovchinnikov and Masta 
201 2). At the present time, we do not have enough informa- 
tion to choose between these two options. 

Secondary Structures 

CRs are typically rich in secondary structures (Breton et al. 
2009 and references therein) both at DNA and RNA level. 
There is clear evidence that secondary structures play a crucial 
role in biological processes such as DNA replication, transcrip- 
tion, recombination, repair, cleavage, control of gene expres- 
sion, and genome organization (Pereira et al. 2008; Brazda 
et al. 2011). For instance, hairpins and cruciform structures 
can function as recognition sites for transcription factors, and 
their presence has been proved or inferred in many mitochon- 
drial CRs (Cao et al. 2004; Mizi et al. 2005; Arunkumar and 
Nagaraju 2006; Pereira et al. 2008; Passamonti et al. 2011). 
When getting direct molecular evidence is not possible, sec- 
ondary structures can be predicted in silico, and there are 
basically two methods to do it: free energy minimization 
and alignments of homologous sequences. In the first case, 
structures are inferred by calculating the variation of Gibbs 
free energy (AG) due to the folding of the nucleic acid 
(Zuker 2000). The structure with the lowest free energy is 
the most thermodynamically stable, therefore it is supposed 
to be the most common state of the molecule. The downsides 
are that nucleic acid sequences have more than one biologi- 
cally active structure, and that the thermodynamic minimum is 
not always the actual conformational status of the sequence 
in vivo. The second method can provide information about 
evolutionary conservation improving the structure prediction 
accuracy (Xu and Mathews 201 1). We utilized a combination 
of both approaches and found structures conserved between 
M mtDNAs, between F mtDNAs and shared by the two 
lineages (supplementary table S11 and figs. S6-S17, 
Supplementary Material online). The most significant 
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Fig. 6. — Boxplots of SNP polymorphism and SNP effects in F (white), Fm (gray), and M (black) mitochondrial genomes, (a) number of SNPs normalized to 
coding sequence (CDS) length; (b) nonsynonymous (Ns) SNPs to total number of SNPs ratio (polyallelic SNPs only); (c) nonsynonymous (Ns) SNPs to total 
number of SNPs ratio (monoallelic SNPs only); (of) percentage of high-effect SNPs (polyallelic + monoallelic); (e) percentage of moderate-effect SNPs 
(polyallelic + monoallelic); (f) percentage of low-effect SNPs (polyallelic + monoallelic); (g) percentage of high-effect SNPs (monoallelic only); (h) percentage 
of moderate-effect SNPs (monoallelic only); (/) percentage of low-effect SNPs (monoallelic only). Note. — A Kruskal-Wallis nonparametric ANOVA was 
performed. Significance levels of post hoc multiple comparison tests are reported below the x axis of each plot. *P<0.05, **P<0.01, ***P< 0.001, 
ns, nonsignificant. 



structures were predicted in, or close to, the conserved regions 
of the CRs, subunits A and B. The low intra- and interlineage 
variability strongly support a functional role of such sequences, 
and can also be explained by a modulation of mutation rates 
by secondary structures: paired bases in double-stranded stem 
regions are less prone to mutations (Hoede et al. 2006). We 
identified two major DNA structures per lineage: DS1m and 
DS2m in M-type, DS1f and DS2f in F-type (fig. 1). The struc- 
tures show lineage-specific differences with regard to shape 
and number of substructures (stem-loops and stacks). The 
M-type specific structure shows an interesting polymorphism 



in two loops (TT/AA and TGT/ACA; supplementary table S1 1 
and fig. S6, Supplementary Material online), whose function, 
if any, is unknown. The most notable feature of DS2m/f is the 
presence of an invariant sequence in the loop of a substruc- 
ture (DS2m-m and DS2f-i), 26-30 bp upstream motif y. Loop 
sequences are more vulnerable to mutations, and a 100% 
conservation among all sequenced M and F mtDNAs hardly 
can be labeled as coincidental. Moreover, the loop of another 
substructure (DS2m-l and DS2f-h) shows an inter-lineage se- 
quence conservation, although partial: the central part of the 
loop is indeed different (TAAA in M-type and GTY in F-type). 
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As far as RNA is concerned, we found three structures (RS1 , 
RS2, RS3; fig. 1) shared by the two mitochondrial genomes. 
RNAz alignments show a high inter-lineage conservation and 
multiple compensatory base changes in the stem regions (sup- 
plementary figs. S10-S17, Supplementary Material online), 
which suggest the functionality of the structures. The three 
structures are localized on the reverse strand; this is interesting 
especially in the case of RS3, which is formed in the same 
region as DS2m/f but on the opposite strand. Although 
being on the complementary strand, RS3 does not have the 
same folding as the corresponding DNA structure (DS2m/f), 
but notably the substructures m, i, I, and h are present also in 
RNA, forming a complementary copy. This is another clue 
pointing to some biological function for these substructures 
and for the conserved sequence that they carry in their loops. 
Our analysis also identified 5 lineage-specific RNA secondary 
structures: RS4m, RS5m, and RS6m in the M-type, RS4f, and 
RS5f in the F-type. RS4m and RS5m are very similar between 
each other because are formed in a region with repeated 
sequences (Ra, Rb, Rc, fig. 1). RS6m occupies the same posi- 
tion as DS1m, it forms on the opposite strand, and shares 
substructures e and f with the correspondent DNA structure. 
In the F mtDNA, upstream subunit A, only one RNA structure 
is present (RS4f). Finally, RS5f is in the same position of DS1f, it 
forms on the same strand and is quite similar to its DNA 
counterpart. 

The presence of secondary structures showing inter-lineage 
conservation and forming in proximity of motifs that have a 
role in transcriptional/replicational control (5 and y) suggests 
that they are probably involved in the same process. 

Mitochondrial Transcription 

Slightly more than 9% of the total number of reads mapped 
to mitochondrial DNA and no significant difference between 
males and females was detected (supplementary fig. S18, 
Supplementary Material online), meaning that the amount 
of mitochondrial transcripts in male and female gonads is ap- 
proximately the same. Mitochondrial transcripts have different 
sources in males and females: males are heteroplasmic so their 
transcripts come from both M and F mtDNAs, while the only 
source of mitochondrial transcripts in females is the F-type. 
More specifically, on average, 90.11% of the transcripts in 
male gonads are from M mtDNA, while the remaining 
9.89% are F-type transcripts (supplementary fig. S18, 
Supplementary Material online). This result is expected given 
that, in this species, M-type is always strongly predominant in 
male gonads (Ghiselli et al. 201 1), thus the main reason for 
the difference in transcription is probably the different mtDNA 
copy number. Our analyses showed small traces of M-type 
transcripts in female gonads (0.36%) which can be explained 
in two ways: by a small amount of cross-contamination be- 
tween samples, and/or by the actual presence of M mtDNA in 
female gonads, which can occur sometimes (Ghiselli et al. 



201 1). Given the exiguous amount (and thus the nonsignifi- 
cant statistical weight), these reads were treated as contami- 
nation and excluded from the analyses. Thus, three types of 
mitochondrial genomes (and their transcripts) were consid- 
ered: 1) M-type, which is localized in male gonads and that 
can be inherited by male progeny through sperm; 2) Fm-type, 
which is the F-type present in male gonads and that is an 
evolutionary dead-end because is not transmitted to progeny 
(Ghiselli et al. 2011); 3) F-type, which is localized in female 
gonads and that can be inherited by both male and female 
progeny through eggs. 

mtDNA is transcribed as a polycistronic primary transcript 
which is edited to form mRNAs, but this does not mean that 
mitochondrial genes have always the same relative expression 
level, since differential expression is achieved by post-tran- 
scriptional control (Lynch 2007; Scheffler 2008). We gener- 
ated the RNA-Seq library selecting polyadenylated transcripts, 
so our analysis only includes transcripts that underwent an 
editing phase. 

Autonomous Regulation of Mitochondrial Expression 

Figure 3 shows the transcriptional differences among mito- 
chondrial genes in M-type (black, fig. 3a), F-type (white, 
fig. 3b), and Fm-type (gray, fig. 3c). In figure 3d, the three 
transcriptional profiles are compared: with the exception of 
cox2 and rrnL, the transcription is always significantly different 
between M and F (solid line and dashed line, respectively, see 
P values below x axis). Fm transcription (dotted line) is obvi- 
ously significantly lower in respect to both M and F, but shows 
an interesting feature: its transcriptional profile is almost iden- 
tical to that of F (Spearman's correlation coefficient p = 0.965, 
P< 0.001; Kendall's correlation coefficient x = 0.890, 
P< 0.001; table 2). Except for a small difference in Complex 
III, the transcription level of nuclear-encoded ETC genes does 
not change between male and female gonads (fig. 4a), 
whereas the mitochondrially encoded ETC genes have 
always a significantly different transcription (fig. 4b). Taken 
together, these observations are consistent with the hypoth- 
esis of CO-location for Redox Regulation (CORR; Allen 2003). 
The aim of Allen's hypothesis is to explain the retention of 
genes in cytoplasmic organelles: it states that mitochondria 
and chloroplasts retained genes whose expression need to 
be under direct regulation of the redox state of their products 
or of electron carriers with which their products interact. This 
permits "direct and autonomous redox regulation of gene 
expression" (Allen 2003). The fact that M and Fm show dif- 
ferent transcription profiles under the same nuclear environ- 
ment (male gonad), is consistent with a regulation operated 
by mitochondrial components. Moreover, our data about 
transcription of nuclear-encoded ETC genes (fig. 4a) match 
a prediction of the CORR hypothesis: the nucleus would 
provide a fairly constant pool of transcripts producing mito- 
chondrial precursor proteins, ready to be imported in the 
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mitochondrion following the "decision" of the organelle 
genome (Lane 2007). 

Lineage-Specific Transcription and M-Type Bioenergetic 
Activity 

To explain the observed transcriptional differences between 
M- and F-type mtDNAs, we propose two hypotheses. 1) 
According to several Authors (Zouros 2012) M genome 
could be a selfish or "nearly selfish" element that found a 
way to be inherited through sperm. Under this light, the tran- 
scription profiles shown in figure 3d could support this: the 
"regular" transcription in R. philippinarum gonad would be 
that showed by F and Fm, while M would be less coordinated 
with nuclear factors, therefore showing a different transcrip- 
tion pattern. 2) According to the mitochondrial theory of 
ageing (Allen 1996) there is a division of labor between 
female and male germ line mitochondria. The former have a 
repressed bioenergetic function to prevent mutagenesis 
caused by ROS production thus facing only mutations due 
to replication errors. On the other side, male germ line mito- 
chondria are bioenergetically active (their energy is needed for 
spermatozoa movement), thus more prone to mutagenesis by 
ROS. Therefore, in gametes there is a tradeoff between 
motility and fidelity of mtDNA transmission, implying that mi- 
tochondria that become bioenergetically functional are genet- 
ically disabled (Allen 1996). Recently, de Paula et al. (2013) 
found evidence supporting the hypothesis that oocyte mito- 
chondria are quiescent in the jellyfish Aurelia aurita and dis- 
cussed the Weismann barrier in germ line mitochondria. The 
mitochondrial theory of ageing and de Paula et al. (2013) 
results support the continuity of mitochondrial germ plasm 
(i.e., that acquired mitochondrial mutation is not inherited; 
see de Paula et al. 201 3, fig. 7e). It is clear that DUI represents 
an interesting system to test the mitochondrial theory of 
ageing, as it seems that M mtDNA is breaking the rule of 
mitochondrial germ line continuity. Our results show a signif- 
icantly different transcription pattern between M and 
F mtDNAs (figs. 3d and 4b), but they cannot support the 
quiescence of oocyte mitochondria. Indeed (fig. 3d), even if 
seven protein coding genes showed a higher transcription in 
M {atp6, nd3, nd5, nd6, nd1, nd2, and nd4L), four showed a 
higher transcription in F (cytb, nd4, cox3, and coxl) and one 
showed no significant difference (cox2). In contrast, de Paula 
et al. (201 3) found a marked difference in mitochondrial tran- 
scription between testis and ovary, even though the analysis 
was made on three genes {nd1, cytb, and coxl). This work 
cannot be conclusive about this subject, and further analyses 
(e.g., membrane potential, ROS content and transcription in 
somatic tissues) are needed to better assess the activity of the 
two types of mitochondria in R. philippinarum. 

In DUI organisms, M mitochondria are transmitted through 
sperm to male progeny, thus playing both the roles of energy- 
transducers and genetic templates. How can they escape the 



ROS-induced mutagenesis affecting bioenergetically active or- 
ganelles? Bivalve molluscs habitats (i.e., sediments and inter- 
tidal environments) are subject to recurring hypoxia or anoxia. 
Along with several marine invertebrates, M. edulis (a DUI spe- 
cies) has been found to have facultatively anaerobic mitochon- 
dria capable of malate dismutation, a metabolic pathway 
(common to most parasitic helminths) that produce ATP 
through degradation of carbohydrates (reviewed in Muller 
et al. 2012). Such pathway of facultative anaerobic metabo- 
lism in M. edulis bypasses the ETC Complexes II, III and IV, thus 
reducing ROS production. Interestingly, our data show that, 
compared with F-type mtDNA, M-type transcription is lower 
for Complexes III and IV and higher for complex I and V 
(fig. 4b). We speculate that, to reduce ROS production in 
male germ line, M-type mitochondria in R. philippinarum 
might use malate dismutation as an alternative way to pro- 
duce ATP. We think that this working hypothesis deserves 
further investigation. 

MORF Is Transcribed 

Our data support the functionality of the MORF. Not only the 
sequence is conserved among all the analyzed males and does 
not show indels or stop codons, but it is also transcribed at a 
level which is comparable with that of the other typical 
mitochondrially encoded ETC genes (fig. 3a and d; supple- 
mentary table S14, Supplementary Material online). On the 
other hand, FORF shows a very low level of transcription, the 
lowest among F-type mtDNA genes (fig. 3b and d; supple- 
mentary table S1 5, Supplementary Material online); therefore, 
we are inclined to believe that it is not functional, or that it is 
transcribed in a different developmental stage. 

The cox2 Duplication 

The F genome contains a duplicaton of the cox2 gene, named 
cox2b (fig. 2), a feature that has been also observed in the 
M-type mtDNA of another DUI species, Musculista senhousia 
(Passamonti et al. 201 1 ). The two copies have different length: 
the shortest, cox2, is 1,569 bp long (523 aa), while the 
longest, cox2b, is 1,971 bp long (657 aa). They have also a 
markedly different transcription level (fig. 3b and d), so, for all 
these reasons, we think that cox2b is undergoing a pseudo- 
genization process, or that it is not functioning as a cyto- 
chrome oxidase subunit 2 anymore. In the M genome of 
the freshwater mussel V. ellipsiformis, the cox2 gene has a 
555 bp coding extension that has been hypothesized to 
have a reproductive function (Breton et al. 2007 and refer- 
ences therein). Whether R. philippinarum cox2b underwent a 
neofunctionalization process acquiring a similar function will 
be matter of future investigations. 

Amount of Polymorphism 

The fast-evolving nature of bivalve mtDNA is a well known 
feature, but the underlying mechanisms are not. In DUI 
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species, the M-type mtDNA always showed a higher amount 
of variation in respect to the F-type mtDNA (Zouros 201 2 and 
references therein) except in M. senhousia where the opposite 
pattern was observed, probably due to a historical effect of its 
introduction in the Adriatic Sea (Passamonti 2007). These 
observations led to the hypothesis of a faster evolution of 
the M-type mtDNA, confirmed by several studies in which 
comparisons of whole mitochondrial genomes were used 
(Mizi et al. 2005; Breton et al. 2006; Zbawicka et al. 2010; 
Doucet-Beaupre et al. 201 0). Here, for the first time, we used 
a high-throughput approach to assess the amount and the 
type of polymorphism in the gonadal mitochondrial popula- 
tions. Given that the vast majority of gonadal mtDNAs are 
localized in gametes, this analysis is useful to estimate, both 
quantitatively and qualitatively, the standing genetic variation 
of the mitochondrial population that is going to be transmit- 
ted to the progeny. The high coverage (table 3) allowed us to 
detect rare alleles, and the RNA-Seq protocol gave us the 
chance to avoid PCR-based methods: in a situation were 
DNA sequences are highly polymorphic, PCR primers fail to 
amplify mutated targets, leading to an underestimation of the 
actual variability (see Theologidis et al. 2008 for a detailed 
discussion). 

Figure 5 shows the distribution of allele frequencies in M 
and F. F-type mtDNAs show an U-shaped distribution, with a 
low proportion of intermediate-frequency alleles and a high 
proportion of rare alleles. The abundance of low fre- 
quency variants causes a shift of high frequency alleles to- 
wards a slightly lower frequency class. The distribution in 
M-type is significantly different (Kolmogorov-Smirnov test, 
P= 0.0061), with a lower proportion of rare alleles and a 
much higher proportion of mid-frequency alleles. The differ- 
ent amount of low-frequency alleles can be explained in term 
of bottleneck size. During its inheritance route, mitochondrial 
population is subject to a dramatic reduction followed by a 
massive expansion (see Ghiselli et al. 2011 and Milani et al. 
201 1 for discussions about mitochondrial bottleneck in DUI 
animals). After a population shrinkage, rare alleles are quickly 
eliminated while intermediate and high-frequency alleles are 
preserved (Maruyama and Fuerst 1984, 1985). Because of the 
higher number of mtDNAs in eggs compared with sperm 
(~10x in this species; see Ghiselli et al. 2011), F-type 
mtDNAs experience a wider bottleneck, therefore the larger 
population size is compatible with a higher amount of low 
frequency alleles. The above-mentioned rationale also explains 
the persistence of intermediate-frequency alleles in M despite 
the narrower bottleneck since intermediate-frequency alleles 
are less likely to be eliminated by drift and more likely to be 
fixed by selection (Olson-Manning et al. 2012). Although pop- 
ulation size effects can account for the loss of rare variants 
they cannot justify the difference in mid-frequency alleles be- 
tween M- and F-type, whose explanation might be found in a 
different action of natural selection. It is well known that mi- 
tochondrial genomes evolve mainly under purifying selection 



(Rand 2001; Meiklejohn et al. 2007; Galtier et al. 2009b), 
nonetheless, deviations from the negative selection regime 
have been reported in gynodioecious plants (Galtier et al. 
2009b and references therein). Gynodioecy is a form of 
sexual dimorphism in which females and hermaphrodites co- 
exist in the same population (Couvet et al. 1998); in this 
system, gender is determined by epistatic interactions be- 
tween mitochondrial and nuclear loci, a mechanism known 
as Cytoplasmic Male Sterility (CMS, see Chase 2007 for a 
review). In CMS mitochondrial ORFs produce chimeric proteins 
which cause pollen sterility, and this process can be counter- 
acted by one or more nuclear restorer-of-fertility genes. The 
ongoing conflict between CMS mitotypes and nuclear re- 
storers leads to long-term balancing selection, as observed 
in several CMS species (Gouyon et al. 1991; Couvet et al. 
1998; Houliston and Olson 2006). Even if at speculation 
level, the DUI system presents some intriguing resemblances 
with CMS, and the distribution pattern of allele frequency in 
M-type mtDNA is an additional similarity that deserves to be 
further investigated. 

Figure 6a reports the total number of SNPs: F and Fm show a 
higher number of SNPs (with no significant difference between 
them) in respect to M (P< 0.001). Taken together with the 
allele frequency data, this piece of information indicates a dif- 
ferent kind of polymorphism between egg-transmitted (F and 
Fm) and sperm-transmitted (M) mtDNAs. F and Fm show more 
variable sites and rare alleles, while M shows a lower number 
of variable sites but with a higher proportion of alleles with 
intermediate frequency. This means that F and Fm variability 
has been underestimated until now: a large part of the poly- 
morphism has been hidden, given the difficulties in amplifying 
and sequencing rare alleles with PCR-based methods. 

Type of Polymorphism 

There is a large number of mitochondria in every cell, and each 
mitochondrion has multiple copies of mtDNA. In such condi- 
tions, it is difficult to understand how much a deleterious 
mutation affects the biological function of an organelle (see 
Rand 2001 for a review on multi-level selection on mtDNA). 
The high ploidy of mtDNA in a cell implies that functional 
copies of the genes can buffer the malfunctioning or nonfunc- 
tioning copies, practically slowing down the action of natural 
selection on deleterious alleles. Selection acts on mitochondria 
through the autophagy process, which eliminates damaged 
and old organelles (mitophagy, see Youle and Van Der Bliek 
2012). If natural selection is partially blinded by the buffering 
effect of multiple copy numbers, we expect a high amount of 
nonsynonymous polymorphism to exist in mitochondrial pop- 
ulations. This is actually what we observed: the median ratio of 
nonsynonymous to total number of SNPs is 0.64 in M, 0.56 in 
Fm, and 0.70 in F (Ns/Tot*, table 4). An even more clear indi- 
cation of the buffering process comes from the comparison of 
nonsynonymous polymorphism between polyallelic and 
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monoallelic SNPs. We defined polyallelic those SNPs which are 
present with multiple alleles within an individual, and mono- 
allelic those which have a single allele. Monoallelic SNPs have 
always a lower proportion of nonsynonymous changes 
(P= 1.904E-7): the percentage drops from 75.2% to 32% 
(2.35x) in M, from 62.5% to 42.1 % (1 .48x) in Fm and from 
77.5% to 40.4% (1.91x) in F (Ns/Tot, table 4). Deleterious 
monoallelic SNPs cannot be buffered by alternative functional 
alleles, so the probability of their persistence in the population 
is lower. Reinforcing this concept, we observed that polyallelic 
SNPs had always the functional allele among their variants. 
Figure 6b and c show that the drop of nonsynonymous poly- 
morphism between polyallelic and monoallelic SNPs is differ- 
ent in the three mtDNAs. M and F have a higher amount of 
nonsynonymous polyallelic SNPs, in comparison with Fm 
(P<0.05 and P<0.01, respectively), but their nonsynon- 
ymous polymorphism is more strongly reduced in monoallelic 
SNPs. Interestingly, the reduction is higher in the M-type 
(2.35x, table 4 and figure 6b and c). 

To better understand the type of sequence variation in our 
mitochondrial populations, we analyzed the SNP effects, 
which were subdivided in three classes by the snpEff software 
(Cingolani et al. 2012). The high effect class includes nonsy- 
nonymous mutations that likely can provoke a loss of function 
(start lost, frameshift, nonsense, stop lost, and rare amino 
acid). Medium effect SNPs are also nonsynonymous substitu- 
tions, but not as disruptive as those in the previous class: they 
cause alterations that probably entail a lower functionality of 
the protein, but that can be tolerated (codon change, codon 
insertion, and codon deletion). In some cases, the functionality 
could also be improved, but the occurrence of advantageous 
mutations is obviously rare. Finally, low effect SNPs is substan- 
tially synonymous changes (synonymous start, nonsynon- 
ymous start, start gained, synonymous coding, and 
synonymous stop). The percentage of high, moderate, and 
low effect SNPs in the three genomes always follows the 
same pattern: moderate effect substitutions are the most 
common (%Mod*, table 4), low effects have an intermediate 
proportion (%Low*, table 4) and finally, as expected, high 
effect SNPs are the rarest (%Hi*, table 4). The abundance 
of moderate effects in respect to low effects is more 
marked in F and M (figure 6e and f; table 4), compared 
with Fm. This could be the result of the larger number of 
replications of germ line mtDNAs (Fm is not inherited so it 
does not undergo the same rounds of replication of the 
other mtDNAs): nonsynonymous mutations are more fre- 
quent and, if buffered by functional copies, their effect is 
small and they are not purged by selection. High-effect sub- 
stitutions are more dangerous, therefore more subject to se- 
lection and for this reason are the rarest class. 

We performed the same analysis also on monoallelic SNPs: 
due to the lack of buffering effect, selection is more effective 
on nonsynonymous mutations and this is reflected by the per- 
centages of high, moderate and low effect substitutions. 



Indeed, in monoallelic SNPs the most common class is the 
low-effect followed by moderate and high (% Low, % 
Mod, and % Hi; table 4), that is, synonymous substitutions 
are the most common. Compared with polyallelic SNPs, both 
high- and moderate-effect classes drop their percentages in 
monoallelic SNPs (fig. 6g and h; table 4). 

Overall, our data are consistent with a lower amount of 
deleterious polymorphism in M-type in comparison with F 
(fig. 6d and g), that can be explained by a different efficiency 
of selection on gametes. The presence of hundreds of F 
mtDNAs in eggs entails a strong buffering effect on deleteri- 
ous mutations which are complemented by wild-type alleles. 
R. philippinarum spermatozoa carry only four mitochondria 
(Milani et al. 201 1), corresponding to a few dozen mtDNAs 
(Ghiselli et al. 201 1 ), thus the buffering effect is much weaker, 
and deleterious mutations are more exposed to selection. 
After spawning, M-type mitochondria are subject to an in- 
tense selection since only the most viable spermatozoa can 
fertilize an egg and produce a healthy embryo. This leads to an 
unusual situation in which a smaller population size results in a 
more efficient selection. 

Conclusions 

The high amount of URs in the fast-evolving mtDNAs of bi- 
valves seems at first to elude the evolutionary pressure to- 
wards a reduction of the genome size. The main causes for 
the origin of extragenic sequences in mtDNAs are slipped- 
strand mispairing, errors in termination of replication (Boore 
2000) and recombination (Ladoukakis 201 1 and references 
therein). The extraordinary variability in gene arrangement 
and the presence of gene duplications suggest that such 
mechanisms are particularly active in bivalves, and the ele- 
vated mutation rate plus the low efficiency of the DNA mis- 
match repair system could be the underlying reasons. 
Although the origin of intergenic DNA is due to completely 
stochastic processes, its persistence is probably adaptive: the 
presence of sequences, motifs, secondary structures with a 
regulatory role, and transcribed ORFs with a still unknown 
function, can prevent the loss through GRE. Under this 
view, the redundancy generated by duplications and/or acqui- 
sition of extra sequences allowed the evolution (gain-of -func- 
tion) of mitochondrially encoded factors possibly interacting 
with the extramitochondrial environment and the nucleus by 
means of retrograde signaling. In DUI bivalves, these factors 
could be responsible for the unusual inheritance system and 
for the different transcriptional behavior of the two organelle 
lineages. 

It has been established that, among the fast-evolving bival- 
vian mtDNAs, M-type of DUI species is the fastest. From our 
data, it is clear that M and F are actually pretty close as far as 
the amount of polymorphism goes. This means that the 
higher evolutionary rate of M is not caused by the higher 
polymorphism in germ line mitochondria. If M existence is 
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just the effect of the acquired ability to invade male germ line, 
M would have to carry out its biological functions only when F 
cannot do it (i.e., in male gonad). Following this rationale, 
some Authors proposed that M faster evolutionary rate 
could be explained by a relaxed selection due to the reduced 
biological role (Zouros 201 2). Even if this is true, we argue that 
the remaining function of M-type is an extremely important 
one (i.e., the contribution to gamete production and function- 
ality), and a relaxed selection would affect gamete fitness. 
Indeed, even a modest reduction of energy production by 
mitochondria is known to reduce male fertility, and a decrease 
in male fitness reduces the viability of the population 
(Gemmell and Allendorf 2001; Meiklejohn et al. 2007). 
From our data on SNP effects, M has the lowest proportion 
of nonsynonymous polymorphism (fig. 6), particularly in the 
high-effect class and in monoallelic SNPs, and this is not in 
agreement with a relaxed selection. An alternative scenario 
would be that M has a function in sperm and/or spermato- 
genesis: sex and reproduction-related genes evolve rapidly 
(Ellegren and Parsch 2007; Parsch and Ellegren 2013) and a 
co-evolution between nuclear and mitochondrial factors in- 
volved in spermatogenesis could be the engine of M-type 
mtDNA fast evolution. Another hypothesis to explain M evo- 
lutionary rate is sperm competition, a particularly strong phe- 
nomenon in broadcast spawning animals (Palumbi 2009). DUI 
is the only known biological system in which a mtDNA can be 
under selection for male functions. In species with strict ma- 
ternal inheritance of mitochondria, deleterious mutations that 
affect only males are not subject to natural selection (Gemmell 
and Allendorf 2001; Gemmell et al. 2004), so mtDNA muta- 
tions can reduce male fertility without effects on females. On 
the contrary, in DUI species natural selection can work on M 
mtDNA and this could increase male fitness and be beneficial 
for the entire species. From this point of view, the high pro- 
portion of intermediate-frequency alleles in M can be seen as a 
good predictor of its evolutionary potential: rare alleles do not 
contribute to the immediate response to selection, but inter- 
mediate-frequency alleles do (Allendorf 1986). Under this 
light, even if DUI arose for nonadaptive reasons, its mainte- 
nance would be selectively advantageous. 

Supplementary Material 

Supplementary figures S1-S19 and tables S1-S15 are avail- 
able at Genome Biology and Evolution online (http:/A/vww. 
gbe.oxfordjournals.org/). 
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